AITopics

Country:

Asia > Singapore (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications (0.68)

Chua, Jia Yun, Zolotas, Argyrios, Arana-Catania, Miguel

Efficient Few-Shot Learning in Remote Sensing: Fusing Vision and Vision-Language Models

arXiv.org Artificial Intelligence, Oct-17-2025

Remote sensing has become a vital tool across sectors such as urban planning, environmental monitoring, and disaster response. While the volume of data generated has increased significantly, traditional vision models are often constrained by their need for extensive domain-specific labelled data and by their limited ability to understand context within complex environments. Vision Language Models (VLMs) offer a complementary approach by integrating visual and textual data; however, their application to remote sensing remains underexplored, particularly given their generalist nature. This work investigates combining vision models and VLMs to enhance image analysis in remote sensing, with a focus on aircraft detection and scene understanding. Integrating YOLO with VLMs such as LLaVA, ChatGPT, and Gemini aims to achieve more accurate and contextually aware image interpretation. Performance is evaluated on labelled and unlabelled remote sensing data, as well as on degraded-image scenarios, which are crucial for remote sensing. Averaged across models, the approach improves the mean absolute error (MAE) of aircraft detection and counting by 48.46% in both raw and degraded scenarios, especially under challenging conditions. It also improves CLIPScore by 6.17% for comprehensive understanding of remote sensing images. Combining traditional vision models with VLMs thus paves the way for more advanced and efficient remote sensing image analysis, especially in few-shot learning scenarios.
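
The abstract reports its counting result as an average MAE improvement across models. The exact formula is not given, so the sketch below assumes it is the relative reduction in MAE from a baseline pipeline to the combined YOLO + VLM pipeline, averaged over models. The model names and counts are hypothetical placeholders, not data from the paper.

```python
import numpy as np


def mae(predicted, actual):
    """Mean absolute error between predicted and ground-truth object counts."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(predicted - actual)))


def relative_mae_improvement(mae_baseline, mae_combined):
    """Percentage reduction in MAE from a baseline pipeline to a combined one."""
    return 100.0 * (mae_baseline - mae_combined) / mae_baseline


def average_improvement(ground_truth, runs):
    """Average the per-model MAE improvement.

    runs maps a (hypothetical) model name to a (baseline_counts, combined_counts) pair.
    """
    gains = [
        relative_mae_improvement(mae(base, ground_truth), mae(comb, ground_truth))
        for base, comb in runs.values()
    ]
    return sum(gains) / len(gains)


# Hypothetical per-image aircraft counts, for illustration only.
ground_truth = [12, 5, 30, 8]
runs = {
    "model_a": ([9, 7, 22, 8], [11, 5, 28, 9]),
    "model_b": ([14, 5, 25, 6], [12, 6, 29, 8]),
}
print(f"average MAE improvement: {average_improvement(ground_truth, runs):.2f}%")
```

Averaging relative improvements per model, rather than pooling all errors, keeps a model with large absolute errors from dominating the summary figure; whether the authors did the same is an assumption.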

large language model, machine learning, vlm

2510.13993

Country: Asia > Japan (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Oct-10-2025, 12:56:23 GMT

From Chaos to Clarity: 3DGS in the Dark

Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation.

noise, proceedings, raw image

Country:

Asia > Singapore (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications (0.68)

Neural Information Processing Systems, Oct-10-2025, 09:51:11 GMT

92dd1adab39f362046f99dfe3c39d90f-Paper-Conference.pdf

gaussian, le3d, representation

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Media > Photography (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)

Dorise, Adrien, Bellizzi, Marjorie, Girard, Adrien, Francesconi, Benjamin, May, Stéphane

Explaining raw data complexity to improve satellite onboard processing

arXiv.org Artificial Intelligence, Oct-10-2025

With increasing processing power, deploying AI models for remote sensing directly onboard satellites is becoming feasible. However, new constraints arise, mainly when using raw, unprocessed sensor data instead of preprocessed ground-based products. Current solutions primarily rely on preprocessed sensor images, and few approaches directly leverage raw data. This study investigates the effects of using raw data on deep learning models for object detection and classification tasks. We introduce a simulation workflow that generates raw-like products from high-resolution L1 imagery, enabling systematic evaluation. Two object detection models (YOLOv11n and YOLOX-S) are trained on both raw and L1 datasets, and their performance is compared using standard detection metrics and explainability tools. Results indicate that while both models perform similarly at low to medium confidence thresholds, the model trained on raw data struggles to identify object boundaries at high confidence levels. This suggests that adapting AI architectures with improved contouring methods can enhance object detection on raw images and thereby improve onboard AI for remote sensing.
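
The comparison above rests on detection metrics evaluated at different confidence thresholds. The sketch below shows one common way to do this: keep detections above a confidence threshold, greedily match them one-to-one to ground-truth boxes by intersection over union (IoU), and report precision and recall. It is a simplified illustration, not the paper's evaluation code; the function names, box format (x1, y1, x2, y2) and the greedy matching rule are assumptions, and full COCO-style evaluation is more elaborate.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truth, conf_threshold, iou_threshold=0.5):
    """Precision and recall for detections given as (box, score) pairs.

    Detections below conf_threshold are discarded; the rest are matched greedily,
    highest score first, to the unmatched ground-truth box with the best IoU.
    """
    kept = sorted((d for d in detections if d[1] >= conf_threshold), key=lambda d: -d[1])
    matched = set()
    tp = 0
    for box, _score in kept:
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(kept) - tp
    fn = len(ground_truth) - tp
    # By convention, precision is 1.0 when no detections survive the threshold.
    precision = tp / (tp + fp) if kept else 1.0
    recall = tp / (tp + fn) if ground_truth else 1.0
    return precision, recall


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [((1, 1, 10, 10), 0.9), ((20, 20, 26, 30), 0.6), ((50, 50, 60, 60), 0.3)]
for threshold in (0.25, 0.5, 0.95):
    print(threshold, precision_recall(dets, gt, threshold))
```

In this toy example the second detection overlaps its target with an IoU of only 0.6, so it still counts as a hit at the default IoU of 0.5 but becomes a false positive under a stricter IoU such as 0.75. That is one way imprecise object boundaries can show up in standard metrics.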

artificial intelligence, deep learning, machine learning

2510.06858

Country: Europe > France (0.29)

Genre: Research Report > New Finding (0.47)

Industry:

Transportation > Marine (0.93)
Energy (0.89)
Transportation > Freight & Logistics Services > Shipping (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence, Jan-31-2025

Single cell resolution 3D imaging and segmentation within intact live tissues

Paci, G., Vicente-Munuera, P., Fernandez-Mosquera, I., Miranda, A., Lau, K., Zhang, Q., Barrientos, R., Mao, Y.

Epithelial cells form diverse structures, from squamous spherical organoids to densely packed pseudostratified folded tissues. Quantifying cellular properties in these contexts requires high-resolution deep imaging and computational techniques that faithfully capture three-dimensional (3D) structural features. Here, we describe a detailed step-by-step protocol for sample preparation, imaging and deep-learning-assisted cell segmentation to accurately quantify fluorescently labelled individual cells in 3D within live tissues. We share the "lessons learned" while troubleshooting 3D imaging of Drosophila wing discs, including considerations on the choice of microscopy modality and settings (objective, sample mounting) and on available segmentation methods. In addition, we provide a computational pipeline with custom code to assist replication of the protocol. While we focus on segmenting cell outlines from membrane labelling, this protocol applies to a wide variety of samples, and we believe it will be valuable for studying other tissues that demand complex 3D analysis.
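
Once cells are segmented in 3D, quantification typically starts from a label volume in which each cell has its own integer ID. The sketch below is a hypothetical helper, not the protocol's published code: it computes one cellular property, volume, and accounts for anisotropic voxels, since confocal z-steps are usually larger than the in-plane pixel size. The function name and voxel sizes are illustrative assumptions.

```python
import numpy as np


def cell_volumes(labels, voxel_size_um):
    """Return {cell_id: volume in cubic micrometres} for a 3D label image.

    labels: integer array of shape (z, y, x); 0 is background and each positive
        integer marks one segmented cell.
    voxel_size_um: (dz, dy, dx) physical voxel size in micrometres.
    """
    voxel_volume = float(np.prod(voxel_size_um))
    ids, counts = np.unique(labels, return_counts=True)
    # Skip the background label and convert voxel counts to physical volume.
    return {int(i): float(c * voxel_volume) for i, c in zip(ids, counts) if i != 0}


# Toy 4 x 4 x 4 stack with two "cells" and anisotropic voxels (1.0 um along z).
labels = np.zeros((4, 4, 4), dtype=int)
labels[0:2, 0:2, 0:2] = 1
labels[2:4, :, :] = 2
print(cell_volumes(labels, (1.0, 0.5, 0.5)))
```

Ignoring voxel anisotropy would overestimate or underestimate volumes by the ratio of the z-step to the in-plane pixel size, which is why the physical voxel size is an explicit parameter here.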

artificial intelligence, machine learning, segmentation

2501.19203

Country:

Oceania > Fiji (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.89)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Neural Information Processing Systems, Jan-13-2025, 20:18:26 GMT

Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

We introduce Embed to Control (E2C), a method for model learning and control of non-linear dynamical systems from raw pixel images. E2C consists of a deep generative model, belonging to the family of variational autoencoders, that learns to generate image trajectories from a latent space in which the dynamics is constrained to be locally linear. Our model is derived directly from an optimal control formulation in latent space, supports long-term prediction of image sequences and exhibits strong performance on a variety of complex control problems.
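
Locally linear latent dynamics can be written as z_{t+1} = A_t z_t + B_t u_t + o_t, where the matrices A_t, B_t and offset o_t depend on the current latent state z_t. The sketch below implements only this transition and a rollout. The encoder, decoder, variational training and the network that predicts A_t, B_t and o_t are omitted; that network is replaced by hypothetical fixed random weights (W_A, W_B, W_O, B_BASE), so the dynamics are illustrative only.

```python
import numpy as np

LATENT_DIM, CONTROL_DIM = 2, 1
rng = np.random.default_rng(0)

# Hypothetical stand-in for a learned transition network: small fixed random
# weights that map the current latent state z_t to A_t, B_t and o_t.
W_A = 0.05 * rng.standard_normal((LATENT_DIM * LATENT_DIM, LATENT_DIM))
W_B = 0.05 * rng.standard_normal((LATENT_DIM * CONTROL_DIM, LATENT_DIM))
W_O = 0.05 * rng.standard_normal((LATENT_DIM, LATENT_DIM))
B_BASE = np.array([[0.0], [1.0]])


def local_linearization(z):
    """Return (A_t, B_t, o_t) evaluated at the latent state z."""
    h = np.tanh(z)
    A = np.eye(LATENT_DIM) + (W_A @ h).reshape(LATENT_DIM, LATENT_DIM)
    B = B_BASE + (W_B @ h).reshape(LATENT_DIM, CONTROL_DIM)
    o = W_O @ h
    return A, B, o


def step(z, u):
    """One latent transition: z_{t+1} = A_t z_t + B_t u_t + o_t."""
    A, B, o = local_linearization(z)
    return A @ z + B @ u + o


def rollout(z0, controls):
    """Predict a latent trajectory for a sequence of control inputs."""
    states = [np.asarray(z0, dtype=float)]
    for u in controls:
        states.append(step(states[-1], np.asarray(u, dtype=float)))
    return np.stack(states)


trajectory = rollout(np.array([0.5, -0.2]), [[0.1]] * 5)
print(trajectory.shape)  # (6, 2)
```

Because each step is affine in the control for a fixed state, classical local methods such as iLQR can plan in the latent space; that is the main motivation for constraining the dynamics this way.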

artificial intelligence, linear latent dynamic model, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.33)

Huang, Sida, Zhang, Hongyuan, Li, Xuelong

Enhance Vision-Language Alignment with Noise

arXiv.org Artificial Intelligence, Dec-16-2024

With the advancement of pre-trained vision-language (VL) models, enhancing the alignment between visual and linguistic modalities in downstream tasks has emerged as a critical challenge. Unlike existing fine-tuning methods that add extra modules to these two modalities, we investigate whether a frozen model can be fine-tuned with customized noise. Our approach is motivated by the scientific study of beneficial noise, namely Positive-incentive Noise (Pi-noise or $\pi$-noise), which quantitatively analyzes the impact of noise. This suggests a new scheme for learning a beneficial noise distribution that can be used to fine-tune VL models. Focusing on CLIP-based few-shot classification, we reformulate CLIP's inference process and apply variational inference, demonstrating how to generate $\pi$-noise for the visual and linguistic modalities. We then propose the Positive-incentive Noise Injector (PiNI), which fine-tunes CLIP by injecting noise into both the visual and text encoders. Because the proposed method learns the distribution of beneficial noise, it yields more diverse vision and language embeddings that better align the two modalities for specific downstream tasks with limited computational resources. We evaluate different noise incorporation approaches and network architectures for PiNI. Evaluation across 11 datasets demonstrates its effectiveness.
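
The mechanism described above combines two pieces: CLIP-style classification, which scores each class by the scaled cosine similarity between the image embedding and a text prompt embedding, and a learned noise distribution injected into both modalities. The abstract does not specify how the noise is parameterized, so the sketch assumes a diagonal Gaussian N(mu, sigma^2) added to each embedding with the reparameterization trick, and it omits training entirely. The embeddings, noise parameters and the logit scale of 100 are placeholders, not values from PiNI or CLIP checkpoints.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def inject_noise(emb, mu, log_sigma, rng):
    """Add Gaussian noise N(mu, sigma^2) via the reparameterization trick."""
    eps = rng.standard_normal(emb.shape)
    return emb + mu + np.exp(log_sigma) * eps


def noisy_clip_logits(image_emb, text_embs, img_noise, txt_noise, rng, logit_scale=100.0):
    """CLIP-style class logits after injecting noise into both modalities.

    image_emb: (D,) image embedding; text_embs: (K, D), one prompt embedding per class.
    img_noise, txt_noise: hypothetical (mu, log_sigma) pairs, each broadcastable to D.
    """
    img = l2_normalize(inject_noise(image_emb, *img_noise, rng))
    txt = l2_normalize(inject_noise(text_embs, *txt_noise, rng))
    return logit_scale * (txt @ img)  # scaled cosine similarity per class


def softmax(logits):
    """Numerically stable softmax over a 1D array of logits."""
    shifted = np.exp(logits - logits.max())
    return shifted / shifted.sum()


rng = np.random.default_rng(0)
image_emb = np.array([0.9, 0.1, 0.0])
text_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
noise = (np.zeros(3), np.full(3, -2.0))  # small noise, sigma = exp(-2)
print(softmax(noisy_clip_logits(image_emb, text_embs, noise, noise, rng)))
```

Sampling through mu + sigma * eps keeps the noise differentiable with respect to mu and log_sigma, which is what would allow a noise distribution to be learned by gradient descent while the encoders stay frozen.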

artificial intelligence, machine learning, natural language

2412.10817

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Dominican Republic (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Oct-28-2024

Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Zhang, Jiawei, Li, Jun, Ly, Reachsak, Liu, Yunyi, Shu, Jiangpeng

For structural health monitoring, continuous and automatic crack detection remains a challenging problem. This study proposes a framework for automatic crack segmentation from high-resolution images of steel box bridge girders that contain crack information. Given the multi-scale nature of cracks, a convolutional neural network architecture based on Feature Pyramid Networks (FPN) is proposed for crack detection. As input, 120 raw images are processed using two approaches: shrinking the images and splitting them into sub-images. Models with the proposed FPN structure are then developed for crack detection. The results show that all developed models can automatically detect cracks in the raw images. Shrinking the images improves computational efficiency without decreasing accuracy. Because of the separable characteristics of cracks, models using the splitting method produce more accurate crack segmentations than models using the resizing (shrinking) method. Therefore, for high-resolution images, the FPN structure coupled with the splitting method is a promising solution for crack segmentation and detection.
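
The two preprocessing strategies above can be contrasted in a few lines. The sketch splits an image into non-overlapping tiles (zero-padding the bottom and right edges), stitches per-tile outputs back into a full-size mask, and shows a deliberately naive "shrink" by striding. The tile size, padding choice and function names are assumptions rather than the study's settings; real resizing usually interpolates, which blurs thin structures instead of dropping them outright, but either way fine detail is lost that tiling preserves.

```python
import numpy as np


def split_into_tiles(image, tile):
    """Split an (H, W) or (H, W, C) image into non-overlapping tile x tile patches.

    The image is zero-padded on the bottom/right so both sides are multiples of
    `tile`. Returns the stacked patches and their (row, col) top-left offsets.
    """
    h, w = image.shape[:2]
    pad = [(0, (-h) % tile), (0, (-w) % tile)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad)
    tiles, offsets = [], []
    for r in range(0, padded.shape[0], tile):
        for c in range(0, padded.shape[1], tile):
            tiles.append(padded[r:r + tile, c:c + tile])
            offsets.append((r, c))
    return np.stack(tiles), offsets


def stitch_tiles(tiles, offsets, height, width):
    """Reassemble per-tile outputs into one array and crop away the padding."""
    tile = tiles.shape[1]
    out_h = max(r for r, _ in offsets) + tile
    out_w = max(c for _, c in offsets) + tile
    canvas = np.zeros((out_h, out_w) + tiles.shape[3:], dtype=tiles.dtype)
    for t, (r, c) in zip(tiles, offsets):
        canvas[r:r + tile, c:c + tile] = t
    return canvas[:height, :width]


def shrink(image, factor):
    """Naive downsampling by striding; features thinner than `factor` px can vanish."""
    return image[::factor, ::factor]


crack = np.zeros((8, 8), dtype=np.uint8)
crack[:, 3] = 1  # a one-pixel-wide vertical "crack"
tiles, offsets = split_into_tiles(crack, 4)
print(tiles.shape, int(stitch_tiles(tiles, offsets, 8, 8).sum()), int(shrink(crack, 2).sum()))
```

In this toy case tiling keeps every crack pixel at full resolution, while striding by 2 skips column 3 entirely, which mirrors the abstract's finding that splitting gives more accurate segmentations of thin cracks.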

artificial intelligence, machine learning, prediction

2410.21175

Country:

North America > United States (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Sep-2-2024

XNet v2: Fewer Limitations, Better Results and Greater Universality

Zhou, Yanfeng, Li, Lingrui, Wang, Zichen, Liu, Guole, Liu, Ziwen, Yang, Ge

XNet introduces a wavelet-based X-shaped unified architecture for fully- and semi-supervised biomedical segmentation. However, XNet still faces several limitations: performance degrades when images lack high-frequency (HF) information, raw images are underutilized, and fusion is insufficient. To address these issues, we propose XNet v2, a low- and high-frequency complementary model. XNet v2 performs wavelet-based image-level complementary fusion and feeds the fusion results, along with the raw images, into three different sub-networks to construct a consistency loss. Furthermore, we introduce a feature-level fusion module to enhance the transfer of low-frequency (LF) and HF information. XNet v2 achieves state-of-the-art performance in semi-supervised segmentation while maintaining competitive results in fully-supervised learning. More importantly, XNet v2 excels in scenarios where XNet fails. Compared to XNet, XNet v2 exhibits fewer limitations, better results and greater universality. Extensive experiments on three 2D and two 3D datasets demonstrate the effectiveness of XNet v2. Code is available at https://github.com/Yanfeng-Zhou/XNetv2 .
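
XNet v2's image-level fusion builds on splitting an image into low- and high-frequency components with a wavelet transform. The abstract does not give the exact construction, so the sketch below only illustrates the underlying idea with a single-level orthonormal 2D Haar transform: reconstructing from the approximation band alone gives an LF image, reconstructing from the detail bands alone gives an HF image, and the two sum back to the original. The function names are hypothetical, and the sub-band ordering is one convention among several.

```python
import numpy as np


def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform of an (H, W) array with even H, W.

    Returns (ll, (lh, hl, hh)): ll is the low-frequency approximation and the
    other three bands hold row-difference, column-difference and diagonal detail.
    """
    if image.shape[0] % 2 or image.shape[1] % 2:
        raise ValueError("height and width must be even")
    a = image[0::2, 0::2].astype(float)  # top-left of each 2x2 block
    b = image[0::2, 1::2].astype(float)  # top-right
    c = image[1::2, 0::2].astype(float)  # bottom-left
    d = image[1::2, 1::2].astype(float)  # bottom-right
    ll = (a + b + c + d) / 2
    lh = (a + b - c - d) / 2  # top rows minus bottom rows
    hl = (a - b + c - d) / 2  # left columns minus right columns
    hh = (a - b - c + d) / 2  # diagonal difference
    return ll, (lh, hl, hh)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2."""
    lh, hl, hh = details
    out = np.empty((2 * ll.shape[0], 2 * ll.shape[1]))
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return out


def split_frequencies(image):
    """Return full-resolution (low, high) frequency images with low + high == image."""
    ll, details = haar_dwt2(image)
    zeros = np.zeros_like(ll)
    low = haar_idwt2(ll, (zeros, zeros, zeros))
    high = haar_idwt2(zeros, details)
    return low, high


low, high = split_frequencies(np.arange(16, dtype=float).reshape(4, 4))
print(np.round(low, 2))
print(np.round(high, 2))
```

A perfectly flat image produces an all-zero HF component, which gives a concrete picture of the "images lacking HF information" failure case that XNet v2 is designed to handle.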

hf information, information, segmentation

2409.00947

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)